feat: Support QLoRA fine-tuning in post-training#3702

Draft
RexBearIU wants to merge 9 commits into main from jackyf/qlora

Conversation


@RexBearIU RexBearIU commented Apr 20, 2026

Description

This PR extends our existing LoRA integration to support Quantized LoRA (QLoRA) via the Qwix library for our NNX-based models. It adds configuration options for the quantization type and tile size, along with the logic fixes needed to correctly handle quantized node metadata and parameter traversal within the NNX framework.
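For illustration, the new options might appear in sft.yml roughly as follows. Only lora_weight_qtype and lora_tile_size are introduced by this PR; the surrounding keys and the concrete values shown here are hypothetical:

```yaml
# Hypothetical sft.yml fragment: enabling QLoRA on top of an existing LoRA setup.
# lora_weight_qtype and lora_tile_size are the new fields; everything else
# is illustrative and may not match the real config schema.
use_lora: true
lora_rank: 16
lora_weight_qtype: nf4   # quantization type for the frozen base weights
lora_tile_size: 256      # tile size used during quantization
```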

Key Changes

  • Config Additions: Added lora_weight_qtype (e.g., nf4) and lora_tile_size to sft.yml and the LoRA type class to enable QLoRA configurations.
  • NNX Decoder Enhancements:
    • Added metadata preservation (stash_origin_metadata and restore_origin_metadata) to correctly handle partitioning specs across scan boundaries in NNXDecoder.
    • Introduced fix_node_rank logic to dynamically adjust PartitionSpec rank constraints to match parameter shapes during scanning.
  • Qwix Provider Patching (lora_utils.py):
    • Automatically switches between LoRA and QLoRA providers based on the presence of lora_weight_qtype.
    • Implemented _patch_qwix_for_maxtext to resolve integration issues:
      • PTQ Patch: Intercepts jax.numpy.asarray to correctly handle nnx.State arrays wrapped as ptq.QArray objects.
      • Parameter Traversal Patch (find_param): Replaces flax_util.find_param with a custom implementation that accurately searches nnx.Module trees and jax.core.Tracer graphs to find the correct node references for
        quantization.
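The provider-switching behavior described above can be sketched as follows. The field names come from this PR; LoRAConfig and select_provider are hypothetical stand-ins for the real config class and the selection logic in lora_utils.py, and the actual Qwix provider classes are not shown:

```python
# Sketch of the LoRA/QLoRA provider selection described above.
# LoRAConfig and select_provider are illustrative names, not the real API.
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoRAConfig:
  lora_rank: int = 8
  lora_weight_qtype: Optional[str] = None  # e.g. "nf4"; presence enables QLoRA
  lora_tile_size: Optional[int] = None


def select_provider(cfg: LoRAConfig) -> str:
  # QLoRA is chosen whenever a weight quantization type is configured;
  # otherwise the plain LoRA provider is used.
  if cfg.lora_weight_qtype is not None:
    return "qlora"
  return "lora"


print(select_provider(LoRAConfig()))                         # lora
print(select_provider(LoRAConfig(lora_weight_qtype="nf4")))  # qlora
```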

Tests

Updated lora_utils_test.py to cover the new optional QLoRA config fields (lora_weight_qtype and lora_tile_size).
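A minimal sketch of what such a test might look like, assuming the config fields behave as optional dataclass fields; the actual test names and config class in lora_utils_test.py may differ:

```python
# Hypothetical unit-test sketch for the optional QLoRA config fields.
import unittest
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoRAConfig:
  # Stand-in for the real LoRA config class; both QLoRA fields default to None.
  lora_weight_qtype: Optional[str] = None
  lora_tile_size: Optional[int] = None


class QLoRAConfigTest(unittest.TestCase):

  def test_qlora_fields_default_to_none(self):
    cfg = LoRAConfig()
    self.assertIsNone(cfg.lora_weight_qtype)
    self.assertIsNone(cfg.lora_tile_size)

  def test_qlora_fields_round_trip(self):
    cfg = LoRAConfig(lora_weight_qtype="nf4", lora_tile_size=256)
    self.assertEqual(cfg.lora_weight_qtype, "nf4")
    self.assertEqual(cfg.lora_tile_size, 256)


if __name__ == "__main__":
  unittest.main()
```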

Checklist

Before submitting this PR, please make sure (put X in square brackets):

  • I have performed a self-review of my code. For an optional AI review, add the gemini-review label.
  • I have necessary comments in my code, particularly in hard-to-understand areas.
  • I have run end-to-end tests and provided workload links above if applicable.
  • I have made or will make corresponding changes to the doc if needed, including adding new documentation pages to the relevant Table of Contents (toctree directive) as explained in our documentation.

@RexBearIU RexBearIU closed this Apr 20, 2026
@RexBearIU RexBearIU reopened this Apr 21, 2026
@RexBearIU RexBearIU force-pushed the jackyf/feat/lora-nnx branch 3 times, most recently from 669b501 to 9634d50 on April 21, 2026 08:48
@RexBearIU RexBearIU force-pushed the jackyf/qlora branch 2 times, most recently from a2739ab to fc27813 on April 22, 2026 08:01
@RexBearIU RexBearIU force-pushed the jackyf/feat/lora-nnx branch 22 times, most recently from 8531d67 to de940c3 on April 29, 2026 12:28
@RexBearIU RexBearIU force-pushed the jackyf/qlora branch 8 times, most recently from 2d237f8 to f8a7f1b on May 4, 2026 11:06
@RexBearIU RexBearIU force-pushed the jackyf/feat/lora-nnx branch from 5a7bedb to 397f319 on May 5, 2026 08:03
@codecov

codecov Bot commented May 5, 2026

Codecov Report

❌ Patch coverage is 69.69697% with 20 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/maxtext/utils/maxtext_utils_nnx.py | 66.07% | 8 missing and 11 partials ⚠️
src/maxtext/layers/nnx_decoders.py | 87.50% | 0 missing and 1 partial ⚠️


@RexBearIU RexBearIU changed the title Jackyf/qlora feat: Support QLoRA fine-tuning in post-training May 5, 2026
@RexBearIU RexBearIU force-pushed the jackyf/feat/lora-nnx branch 5 times, most recently from 391664f to 1eb5953 on May 8, 2026 11:11
@shralex shralex force-pushed the jackyf/feat/lora-nnx branch 2 times, most recently from 3954ce8 to 6face5b on May 8, 2026 15:01
@RexBearIU RexBearIU force-pushed the jackyf/feat/lora-nnx branch from 6face5b to eced9d7 on May 11, 2026 10:20
Labels: none yet

1 participant